To appear, Physica A 

Computational Models of Adult Neurogenesis 

Guillermo A. Cecchi 
T.J. Watson IBM Research Center, Yorktown Heights, NY, USA 

Marcelo O. Magnasco 
The Rockefeller University, New York City, NY, USA 
(Dated: February 9, 2008) 

Abstract 

Experimental results in recent years have shown that adult neurogenesis is a significant
phenomenon in the mammalian brain. Little is known, however, about the functional role played by
the generation and destruction of neurons in the context of an adult brain. Here we propose
two models where new projection neurons are incorporated. We show that in both models, using
incorporation and removal of neurons as a computational tool, it is possible to achieve a higher
computational efficiency than in purely static, synapse-learning driven networks. We also discuss
the implications for understanding the role of adult neurogenesis in specific brain areas.

I. ADULT NEUROGENESIS AND COMPUTATION



Adult neurogenesis (AN), that is, the incorporation of new neurons in adult brains, has
been documented in the mammalian olfactory bulb and the dentate gyrus, as well as in
a variety of subcortical structures and different areas of the neocortex, but with less
significance [?]. A number of observations point to a functional role for AN, in particular the
increase in proliferation after exposure to novel environments [?], and the incorporation
of new neurons as functionally connected elements [?]. We have recently introduced the
first computational model of adult neurogenesis in the olfactory bulb [7], showing that
the death and incorporation of inhibitory (granule) interneurons is enough to achieve the
orthogonalization of an input ensemble, a task the olfactory bulb is assumed to be involved
in [?]. The model predicted an initial wave of massive cell death for newly incorporated
neurons, followed by a constant, small and protracted background of death. Interestingly,
these qualitative predictions were subsequently confirmed by experiments [?]. This model,
however, was not able to account for the possible computational necessity of AN, given
that similar results can be obtained by synaptic learning. We are nevertheless interested
in proving the hypothesis that AN is a necessary solution, found by evolution, to specific
problems, as opposed to an evolutionary vagary. We present in this letter two models that
provide evidence for this hypothesis. We first discuss a model in which the replacement
of neurons is based on their level of activity, and which displays interesting information-theoretic
properties. We then introduce a model in which the replacement is based on a measure
of correlation between neighboring neurons; this second model shows interesting properties
related to the convergence to minimal distortion of neural maps.



II. REPLACEMENT MODELS 



The activity-based replacement model consists of a network of projection units that learn
to represent an input ensemble based on a winner-take-all (WTA) approach. This model is
related to the Growing Cell Structure (GCS) algorithm introduced by Fritzke but, unlike
it, has a simple physiological interpretation. The GCS approach requires the incorporation
of new neurons with specific synaptic weights, derived from those of already incorporated
neurons; in our algorithm, this "teleological" requirement is not present.

The training proceeds as follows: upon presentation of an input $X_n$, a winner unit $k$ is
selected such that $|W_k - X_n| = \min_i |W_i - X_n|$. The winner and the runners-up are updated
following a standard Hebbian rule: $\Delta W_k \propto X_n - W_k$, and
$\Delta W_i \propto (W_i - W_k)\, f(|W_k - W_i|)$, where $f()$ is a non-linear decreasing
function of its argument; here we choose an exponential. We now introduce the novel component
of the algorithm. First, a winning rate is computed for each unit: $w_i = 1$ if $i = k$, $0$
otherwise, $\Rightarrow \omega_i = \langle w_i \rangle_n$, intended
to capture the unit's recent activity. Based on this value, the units are replaced. This is
done probabilistically, $p(\mathrm{death})_i \propto 1 - \omega_i$. At the same time, new neurons arrive constantly,
with a probability $p(N + 1) = \lambda$. The newly arrived units are not immediately connected:
they are allowed to participate in the competition process, but do not become part of the
representation ensemble until a trial period $\tau$ has passed. In this way, the number of output
units is always bounded, and new neurons are allowed to find a good configuration.
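
The training loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the running average $\langle w_i \rangle_n$ is taken to be an exponential moving average with factor `decay`; death is applied only to units that have passed the trial period; the current winner is never removed in the same step; new units receive Gaussian random weights; and all parameter names and values (`eta`, `eta_rep`, `decay`, `lam`, `tau`, `death_scale`) are hypothetical and untuned, so the sketch does not attempt to reproduce Fig. 1.

```python
import numpy as np


def train_activity_model(X, n_units, n_steps, eta=0.1, eta_rep=0.01,
                         decay=0.99, lam=0.05, tau=200, death_scale=0.001,
                         seed=0):
    """Winner-take-all learning with activity-based replacement (sketch).

    X is an (n_inputs, dim) array of exemplars. Returns the final weights,
    winning rates, unit ages, and the per-step count of established units.
    """
    rng = np.random.default_rng(seed)
    dim = X.shape[1]
    W = rng.normal(size=(n_units, dim))      # one weight vector per unit
    omega = np.full(n_units, 1.0 / n_units)  # winning rate (assumed initial value)
    age = np.full(n_units, tau)              # initial units count as established
    sizes = []
    for _ in range(n_steps):
        x = X[rng.integers(len(X))]          # random input exemplar
        k = int(np.argmin(np.linalg.norm(W - x, axis=1)))  # winner

        # Non-winners are pushed away from the winner, with a strength that
        # decays exponentially with distance: dW_i ~ (W_i - W_k) f(|W_k - W_i|).
        diff = W - W[k]                      # row k is zero, so the winner is unaffected
        gap = np.linalg.norm(diff, axis=1, keepdims=True)
        W = W + eta_rep * diff * np.exp(-gap)
        # The winner moves towards the input: dW_k ~ X_n - W_k.
        W[k] += eta * (x - W[k])

        # Winning rate as an exponential moving average (assumption).
        won = np.zeros(len(W))
        won[k] = 1.0
        omega = decay * omega + (1.0 - decay) * won
        age = age + 1

        # Probabilistic death, p(death)_i ~ 1 - omega_i, for established units.
        p_death = death_scale * (1.0 - omega)
        dies = (age >= tau) & (rng.random(len(W)) < p_death)
        dies[k] = False                      # assumption: keep the current winner
        W, omega, age = W[~dies], omega[~dies], age[~dies]

        # Arrival of a new unit with probability lam; it starts its trial period.
        if rng.random() < lam:
            W = np.vstack([W, rng.normal(size=(1, dim))])
            omega = np.append(omega, 0.0)
            age = np.append(age, 0)

        sizes.append(int(np.sum(age >= tau)))
    return W, omega, age, sizes
```

Units with `age < tau` take part in the competition but are excluded from the reported ensemble size, which mirrors the trial period described in the text.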

The results of the simulation are shown in Fig. 1-A,B. Different input ensembles consisting
of 100 elements are drawn from a uniform distribution on the unit 100D hypersphere. The
network is initialized with an arbitrary number of output units (50[o] and 150[o] in this case),
and is trained with a random sequence of input exemplars. A small amount of noise is added
to the input exemplars, but this is not a determining factor. Panel A shows the evolution
of the mutual information between the input and the output ensembles, relative to the
maximal mutual information (the entropy of the input in this case). The mutual information
here is measured as in a discrete channel, using only the output units, as $I(X, W) =
H(W) - H(W|X)$. We observe that the network indeed evolves towards maximal mutual
information, for both low and high initial numbers of units. This is expected of a network
implementing the Hebbian component of the algorithm. Panel B is, however, more surprising:
it shows that the network evolves towards a number of units equal to the number of input
clusters, independently of the initial number of units. This is indeed a robust feature,
independent also of the total number of input clusters.
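
The discrete-channel measure $I(X, W) = H(W) - H(W|X)$ can be computed from a table of input/winner co-occurrences. The Python sketch below assumes such a table has been collected during training (`counts[m, i]` is the number of times input `m` was won by unit `i`; this bookkeeping and the function names are hypothetical), and uses base-2 logarithms, which is a choice made here rather than one stated in the text.

```python
import numpy as np


def entropy(p):
    """Shannon entropy in bits of a probability vector (zero entries ignored)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))


def mutual_information(counts):
    """Return (I(X, W), H(W|X)) for a table counts[input, winning unit]."""
    joint = np.asarray(counts, dtype=float)
    joint = joint / joint.sum()          # joint distribution p(x, w)
    p_x = joint.sum(axis=1)              # marginal over inputs
    p_w = joint.sum(axis=0)              # marginal over output units
    h_w = entropy(p_w)
    # Chain rule: H(W|X) = H(X, W) - H(X).
    h_w_given_x = entropy(joint.ravel()) - entropy(p_x)
    return h_w - h_w_given_x, h_w_given_x
```

Dividing the first returned value by `entropy(p_x)` gives the mutual information relative to the input entropy, which is the quantity the text describes for Panel A.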

FIG. 1: A: Mutual information of the activity-based model. B: Total number of output units in
the same model; x-axes are iterations $\times 10^{?}$. C: Distortion averaged over initial configurations
for the correlation-based model (solid line) and the simple elastic model (dotted line). Dashes are one
standard deviation, and the x-axis is iterations. D: Average replacement rate for the
correlation-based model.

Given that mutual information maximization can be equally achieved by a non-modifying
Hebbian network, what is the advantage of implementing a network like this one? A key
point here is the ability of the network to track the number of input clusters in the number
of output units. A simple analysis of the discrete channel defined by the network shows
that, for a given number of inputs $M$ and network size $N$, the maximal mutual information
normalized by the input entropy scales like $\log N$ for $N < M$, and is 1 when $N > M$. At the
same time, the redundancy of the output, defined as $H(W|X)$, is 0 when $N < M$, and then
scales as $\log N - \log M$. This shows that a WTA-like network with the same number of units
as the number of inputs in the input ensemble (the cardinality) can at the same time maximize
the mutual information and minimize the output redundancy. Both features are relevant in
the context of brain processing, and in the context of an unknown and possibly changing
cardinality of the input ensemble. In particular, minimizing the total number of outputs
without compromising the mutual information can reduce the wiring and crosstalk in the
target network, albeit at the expense of maintaining a costly mechanism like AN.
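
The scaling with $N$ and $M$ can be checked with a short worked calculation. The sketch below assumes, beyond what the text states, that the $M$ inputs are equiprobable, that the channel is noise-free, and that the $N$ output units are used uniformly: each unit serves $M/N$ inputs when $N \le M$, and each input is shared evenly among $N/M$ units when $N \ge M$.

```latex
% Assumptions (not from the source): equiprobable inputs, so H(X) = \log M;
% uniformly used output units, so H(W) = \log N.
\begin{align*}
N \le M:&\quad H(W|X) = 0,
  & I(X,W) &= H(W) - H(W|X) = \log N,
  & \frac{I(X,W)}{H(X)} &= \frac{\log N}{\log M} \\
N \ge M:&\quad H(W|X) = \log\frac{N}{M} = \log N - \log M,
  & I(X,W) &= \log N - (\log N - \log M) = \log M,
  & \frac{I(X,W)}{H(X)} &= 1
\end{align*}
```

Under these assumptions the two regimes meet at $N = M$, the only network size at which the normalized mutual information is 1 and the redundancy is still zero.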
The model presented above shows one possible advantage of combining classical synaptic
learning with AN. One limitation of this model is that the task required from the network
is fairly simple, and therefore lacks generality. In what follows we present a different
model, in which the replacement of units is driven by correlated activity, and which implements
a less simple form of computation, i.e. topology-preserving mapping. This correlation-based
replacement model is based on the elastic net introduced in [?]. The task of the
network is to represent the input space in a topographically ordered fashion, such that nearby
units in neural space have similar receptive characteristics. The network differs from the
previous one in that no winner needs to be computed, and neighboring units influence each
other. Formally, the update rule is: $\Delta W_i = \alpha A_i (X - W_i) + \beta k \sum_j (W_j - W_i)\, h(d_{ij})$, where
$A_i = e^{-|X - W_i|^2 / 2k^2} / \mathcal{N}$, with $\mathcal{N} = \sum_j e^{-|X - W_j|^2 / 2k^2}$, can be interpreted as a normalized
activity, $d_{ij}$ is the distance between units $i$ and $j$ in neural space, and $h()$ is a monotonically
decreasing function of its argument. In many cases, as here, a simpler version is used
where only the immediate first neighbors of each unit are considered. In this case, the
second term of the update equation reads $\beta (W_{i+1} + W_{i-1} - 2W_i)$. It is interesting to notice
that the update equation minimizes an energy function that can be computed by simple
integration, $E = -\alpha k \sum_n \log \sum_i e^{-|X_n - W_i|^2 / 2k^2} + \beta \sum_i |W_{i+1} - W_i|^2$. The elastic net has
been used in a variety of optimization problems, like the travelling salesman problem, and
also in simulations of cortical maps. It is well known, however, that, like many optimization
algorithms, the elastic net can get trapped in local minima depending on the complexity
of the input space. To illustrate this phenomenon, we implemented an elastic net of 100
units tasked with learning an input space consisting of 100 exemplars distributed on a
two-dimensional space as $y_i = 50 \sin(x_i \pi / 35 + \pi / 2) + 50$. Without scheduling of $k$, the
elastic net can find an optimal solution in less than 1% of initial conditions (Fig. 2).
In contrast, the same network, with the addition of the correlation-based replacement, is
able to find the optimal solution under any initial condition. The algorithm is as follows:
each unit computes a correlation-based or "stress" variable, whose evolution is defined as
$\dot{s}_i(t) = [\ldots] - \mu s_i(t)$, where $\gamma$ and $\mu$ are arbitrary parameters, and the
initial condition for new neurons is $s_i(t)|_{\mathrm{new}} = 0$. When the "stress" reaches a threshold,
the unit is "replaced" by a new one with an arbitrary distribution of synaptic weights:
$s_i > s_t \Rightarrow \mathrm{reset}[W_i]$. The results of the simulation are presented in Panel C, where the
ensemble average over initial conditions shows a linearly decreasing behavior in semilog scale.
Although the individual evolution for different initial configurations can vary dramatically,
for our toy example they all converge to minimal distortion in finite time, in sharp contrast
with the pure elastic net algorithm. Panel D shows the evolution of the replacement rate,
defined as the number of replaced units per cycle. It displays a fast (power-law) initial decay,
followed by a long exponential tail; interestingly, the same qualitative features are observed
both experimentally and computationally in the olfactory bulb, as mentioned above.
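
The mechanics of the correlation-based replacement can be sketched in Python on top of a first-neighbor elastic net. Several ingredients here are assumptions rather than the authors' method: the driving term of the stress equation is not recoverable from the text, so a hypothetical proxy is used (`neighbour_mismatch`, which is 0 when a unit and its chain neighbors have identical weights and approaches 1 when they are far apart); the stress is integrated with a unit Euler step; the chain has open ends; the exemplars are placed at $x_i = 1, \ldots, 100$; and all parameter values are untuned. The sketch shows the update, stress, and reset steps only and makes no claim to reproduce Panels C and D.

```python
import numpy as np


def elastic_update(W, x, k, alpha, beta):
    """One online elastic-net step for input x (first-neighbor version)."""
    d2 = np.sum((W - x) ** 2, axis=1)
    a = np.exp(-(d2 - d2.min()) / (2.0 * k ** 2))  # shifted for numerical stability
    a /= a.sum()                                   # normalized activity A_i
    lap = np.zeros_like(W)
    lap[1:-1] = W[2:] + W[:-2] - 2.0 * W[1:-1]     # W_{i+1} + W_{i-1} - 2 W_i
    lap[0] = W[1] - W[0]                           # open chain ends (assumption)
    lap[-1] = W[-2] - W[-1]
    return W + alpha * a[:, None] * (x - W) + beta * lap


def neighbour_mismatch(W, k):
    """Hypothetical stress drive in [0, 1]: dissimilarity to chain neighbors."""
    d2 = np.sum((W[1:] - W[:-1]) ** 2, axis=1)     # squared length of each link
    sim = np.exp(-d2 / (2.0 * k ** 2))
    m = np.empty(len(W))
    m[1:-1] = 1.0 - 0.5 * (sim[:-1] + sim[1:])
    m[0] = 1.0 - sim[0]
    m[-1] = 1.0 - sim[-1]
    return m


def run_replacement_net(X, n_units=100, n_steps=2000, k=5.0, alpha=0.1,
                        beta=0.1, gamma=0.05, mu=0.05, s_thresh=0.9, seed=0):
    """Elastic net with stress-based replacement; returns (W, s, replaced)."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    W = rng.uniform(lo, hi, size=(n_units, X.shape[1]))
    s = np.zeros(n_units)                          # stress starts at zero
    replaced = []
    for _ in range(n_steps):
        x = X[rng.integers(len(X))]
        W = elastic_update(W, x, k, alpha, beta)
        # Leaky integration: ds = gamma * drive - mu * s (the drive is assumed).
        s += gamma * neighbour_mismatch(W, k) - mu * s
        over = s > s_thresh
        if over.any():
            # Replace stressed units: arbitrary new weights, stress reset to zero.
            W[over] = rng.uniform(lo, hi, size=(int(over.sum()), X.shape[1]))
            s[over] = 0.0
        replaced.append(int(over.sum()))
    return W, s, replaced
```

With `x = np.arange(1, 101, dtype=float)` and `X = np.column_stack([x, 50 * np.sin(x * np.pi / 35 + np.pi / 2) + 50])`, calling `run_replacement_net(X)` returns the final weights, the stress values, and the number of replaced units per step, the last of which corresponds to the replacement rate discussed for Panel D.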

In summary, we have shown two novel mechanisms that suggest that AN may be a
necessary computational tool used by brain structures whenever (a) an input ensemble of
discrete elements and non-stationary cardinality needs to be processed (the activity-based
model), or (b) a topographic map of a complex input space needs to be formed with minimal
distortion (the correlation-based model). The first feature may be relevant for the neural
processing required by the High Vocal Center, a brain area of songbirds where replacement
correlates with song modification [?]. The second feature may be important for the generation
of an adaptive neural map in the Dentate Gyrus (DG) of the Hippocampus. We can only
speculate at this point, but the local circuitry of the DG is complex enough to support a
physiological implementation of the correlation-based algorithm [?], and the correlation of
replacement with exposure to novel environments [4] is compatible with our model.